54 research outputs found
Neural Class-Specific Regression for face verification
Face verification is a problem approached in the literature mainly using
nonlinear class-specific subspace learning techniques. While it has been shown
that kernel-based Class-Specific Discriminant Analysis is able to provide
excellent performance in small- and medium-scale face verification problems,
its application in today's large-scale problems is difficult due to its
training space and computational requirements. In this paper, generalizing our
previous work on kernel-based class-specific discriminant analysis, we show
that class-specific subspace learning can be cast as a regression problem. This
allows us to derive linear, (reduced) kernel and neural network-based
class-specific discriminant analysis methods using efficient batch and/or
iterative training schemes, suited for large-scale learning problems. We test
the performance of these methods on two datasets describing medium- and
large-scale face verification problems.

Comment: 9 pages, 4 figures
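The core idea above, casting class-specific subspace learning as a regression problem, can be illustrated with a minimal sketch. This is not the authors' exact formulation: the target scheme, the ridge regularizer, and all names here are illustrative assumptions, mapping client samples (the class of interest) and impostor samples to opposite-sign targets and solving a linear least-squares problem.

```python
import numpy as np

def class_specific_regression(X, y, dim=2, reg=1e-3):
    """Illustrative sketch: learn a class-specific linear projection by
    ridge regression onto targets that separate the client class (y == 1)
    from impostors (y == 0). Not the paper's exact target construction."""
    n, d = X.shape
    # Clients regress to +1, impostors to -1, on each output dimension
    T = np.where(y[:, None] == 1, 1.0, -1.0) * np.ones((n, dim))
    # Closed-form ridge solution: W = (X^T X + reg*I)^{-1} X^T T
    W = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ T)
    return W  # d x dim projection matrix

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, (20, 5)),    # client samples
               rng.normal(-2.0, 1.0, (30, 5))])  # impostor samples
y = np.array([1] * 20 + [0] * 30)
W = class_specific_regression(X, y)
Z = X @ W  # clients project to the positive side, impostors to the negative
```

Because the solution is a closed-form linear solve (or, equivalently, an iterative least-squares fit), it scales to training set sizes where kernel-based class-specific discriminant analysis becomes impractical, which is the motivation stated in the abstract.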
Deep Multi-view Learning to Rank
We study the problem of learning to rank from multiple information sources.
Though multi-view learning and learning to rank have been studied extensively
leading to a wide range of applications, multi-view learning to rank as a
synergy of both topics has received little attention. The aim of this paper is
to propose a composite ranking method that keeps a close correlation with
the individual rankings. We present a generic framework for
multi-view subspace learning to rank (MvSL2R), and two novel solutions are
introduced under the framework. The first solution captures information of
feature mappings from within each view as well as across views using
autoencoder-like networks. Novel feature embedding methods are formulated in
the optimization of multi-view unsupervised and discriminant autoencoders.
Moreover, we introduce an end-to-end solution to learning towards both the
joint ranking objective and the individual rankings. The proposed solution
enhances the joint ranking with minimum view-specific ranking loss, so that it
can achieve the maximum global view agreements in a single optimization
process. The proposed method is evaluated on three different ranking problems,
i.e. university ranking, multi-view lingual text ranking and image data
ranking, providing superior results compared to related methods.

Comment: Published at IEEE TKD
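The joint objective described above, a shared ranking loss plus view-specific losses that keep each view's ranking consistent, can be sketched as follows. This is a simplified stand-in, not the MvSL2R formulation: the pairwise hinge loss, the score-averaging fusion, and the weighting `alpha` are all assumptions for illustration.

```python
import numpy as np

def pairwise_hinge_loss(scores, relevance, margin=1.0):
    """Pairwise hinge loss: penalise pairs whose score order contradicts
    their relevance order by less than `margin`."""
    diff = scores[:, None] - scores[None, :]
    should_outrank = relevance[:, None] > relevance[None, :]
    return np.maximum(0.0, margin - diff)[should_outrank].sum()

def multiview_rank_loss(view_scores, relevance, alpha=0.5):
    """Illustrative multi-view ranking objective: a joint loss on the fused
    (here: averaged) scores, plus alpha-weighted view-specific losses that
    keep each individual view's ranking close to the relevance ordering."""
    joint = np.mean(view_scores, axis=0)
    loss = pairwise_hinge_loss(joint, relevance)
    for s in view_scores:
        loss += alpha * pairwise_hinge_loss(s, relevance)
    return loss

rel = np.array([3, 2, 1])
agreeing = np.array([[3., 2., 1.], [3., 2., 1.]])   # both views agree: loss 0
conflicting = np.array([[1., 2., 3.], [3., 2., 1.]])  # one view disagrees
```

Minimising such an objective jointly pushes the fused ranking and the per-view rankings toward agreement in a single optimization, which is the "maximum global view agreement" property the abstract refers to.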
Multi-view Data Analysis
Multi-view data analysis is a key technology for making effective decisions by leveraging information from multiple data sources. The process of data acquisition across various sensory modalities gives rise to heterogeneous data. In my thesis, multi-view data representations are studied with the aim of exploiting the enriched information encoded in different domains or feature types, and novel algorithms are formulated to enhance feature discriminability.

Extracting informative data representations is a critical step in visual recognition and data mining tasks. Multi-view embeddings provide a new form of representation learning that bridges the semantic gap between low-level observations and high-level, human-comprehensible knowledge, benefitting from the enriched information in multiple modalities. Recent advances in multi-view learning have introduced a new paradigm for jointly modeling cross-modal data. Subspace learning, which extracts compact features by exploiting a common latent space and fusing multi-view information, has emerged as prominent among the different categories of multi-view learning techniques. This thesis provides novel solutions for learning compact and discriminative multi-view data representations by exploiting the data structure in a low-dimensional subspace. We also demonstrate the performance of the learned representations on a number of challenging tasks in recognition, retrieval and ranking.

The major contribution of the thesis is a unified solution for subspace learning methods, which is extensible to multiple views, supervised learning, and non-linear transformations. Traditional statistical learning techniques, including Canonical Correlation Analysis, Partial Least Squares regression and Linear Discriminant Analysis, are studied by constructing graphs of specific forms under the same framework. Methods using non-linear transforms based on kernels and (deep) neural networks are derived, leading to superior performance compared to the linear ones. First, a novel multi-view discriminant embedding method is proposed that takes the view difference into consideration. Secondly, a multi-view nonparametric discriminant analysis method is introduced that exploits the class-boundary structure and the discrepancy information of the available views; this allows for multiple projection directions by relaxing the Gaussian distribution assumption of related methods. Thirdly, we propose a composite ranking method that keeps a close correlation with the individual rankings for optimal rank fusion. We propose a multi-objective solution to ranking problems that captures inter-view and intra-view information using autoencoder-like networks. Finally, a novel end-to-end solution is introduced to enhance the joint ranking with minimum view-specific ranking loss, so that the maximum global view agreement is achieved within a single optimization process.

In summary, this thesis addresses the challenges of representing multi-view data across different tasks. The proposed solutions have shown superior performance in numerous tasks, including object recognition, cross-modal image retrieval, face recognition and object ranking.
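As one concrete instance of the statistical subspace learning family named above, here is a minimal Canonical Correlation Analysis sketch: whiten each view, then take the SVD of the cross-covariance of the whitened views. The regularizer `reg` and all variable names are assumptions; this is textbook CCA, not the thesis's graph-based unified framework.

```python
import numpy as np

def cca(X, Y, dim=1, reg=1e-6):
    """Minimal CCA sketch: find per-view projections maximising the
    correlation between two views via SVD of the whitened cross-covariance."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Cxx = Xc.T @ Xc / len(X) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / len(Y) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / len(X)
    # Whitening transforms: Wx^T Cxx Wx = I (via Cholesky factors)
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    # Singular values of the whitened cross-covariance = canonical correlations
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    return Wx @ U[:, :dim], Wy @ Vt.T[:, :dim], s[:dim]

# Two noisy views of a shared latent signal z
rng = np.random.default_rng(1)
z = rng.normal(size=(400, 1))
X = np.hstack([z, z]) + 0.3 * rng.normal(size=(400, 2))
Y = np.hstack([z, -z]) + 0.3 * rng.normal(size=(400, 2))
a, b, corr = cca(X, Y)  # corr[0] is high: the views share the signal z
```

The thesis's framework recovers CCA (and PLS, LDA) as special cases of one graph construction; the sketch above only shows the classical closed-form solution for the two-view linear case.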
Learn from Incomplete Tactile Data: Tactile Representation Learning with Masked Autoencoders
Missing signals, caused by occluded objects or an unstable sensor, are a
common challenge during data collection. Such missing signals adversely
affect the results obtained from the data, and the issue is observed
more frequently in robotic tactile perception. In tactile perception, due to
the limited working space and the dynamic environment, the contact between the
tactile sensor and the object is frequently insufficient and unstable, which
causes the partial loss of signals, thus leading to incomplete tactile data.
The tactile data will therefore contain fewer tactile cues with low information
density. In this paper, we propose a tactile representation learning method,
named TacMAE, based on Masked Autoencoder to address the problem of incomplete
tactile data in tactile perception. In our framework, a portion of the tactile
image is masked out to simulate the missing contact region. By reconstructing
the missing signals in the tactile image, the trained model can achieve a
high-level understanding of surface geometry and tactile properties from
limited tactile cues. The experimental results of tactile texture recognition
show that our proposed TacMAE can achieve a high recognition accuracy of 71.4%
in the zero-shot transfer and 85.8% after fine-tuning, which are 15.2% and 8.2%
higher than the results without using masked modeling. The extensive
experiments on YCB objects demonstrate the knowledge transferability of our
proposed method and the potential to improve efficiency in tactile exploration.

Comment: This paper is accepted at IROS 202
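The masking step described above, removing patches of the tactile image to simulate missing contact regions, can be sketched as follows. The patch size, 75% mask ratio, and function names are illustrative assumptions (75% is a common MAE default, not necessarily TacMAE's setting); the reconstruction network itself is omitted.

```python
import numpy as np

def random_mask_patches(image, patch=4, mask_ratio=0.75, rng=None):
    """Illustrative MAE-style masking: split a square tactile image into
    non-overlapping patches and zero out a random subset, simulating
    missing contact regions the autoencoder must reconstruct."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    ph, pw = h // patch, w // patch
    n_patches = ph * pw
    n_masked = int(round(mask_ratio * n_patches))
    masked_ids = rng.choice(n_patches, size=n_masked, replace=False)
    out = image.copy()
    mask = np.zeros(n_patches, dtype=bool)
    mask[masked_ids] = True
    for idx in masked_ids:
        r, c = divmod(idx, pw)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return out, mask  # masked image, plus which patches were removed

img = np.ones((16, 16))  # toy stand-in for a tactile image
masked_img, mask = random_mask_patches(img, patch=4, mask_ratio=0.75,
                                       rng=np.random.default_rng(0))
```

Training the encoder-decoder to reconstruct the zeroed patches from the visible ones is what forces the model to infer surface geometry from limited tactile cues, which is why the pretrained representation transfers to incomplete real contacts.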
Learn to Cluster Faces with Better Subgraphs
Face clustering can provide pseudo-labels to the massive unlabeled face data
and improve the performance of different face recognition models. The existing
clustering methods generally aggregate the features within subgraphs that are
often implemented based on a uniform threshold or a learned cutoff position.
This may reduce the recall of subgraphs and hence degrade the clustering
performance. This work proposes an efficient neighborhood-aware subgraph
adjustment method that significantly reduces the noise and improves the
recall of the subgraphs, and hence drives distant nodes to converge
towards the same centers. More specifically, the proposed method consists of
two components: face embedding enhancement using the embeddings from
neighbors, and enclosed subgraph construction over node pairs for structural
information extraction. The combined embeddings are then used to predict
linkage probabilities for all node pairs, replacing the cosine similarities
and producing new subgraphs that can be further used for aggregation by GCNs
or other clustering methods. The proposed method is validated through
extensive experiments against a range of clustering solutions on three
benchmark datasets, and the numerical results confirm that it outperforms
the SOTA solutions in terms of generalization capability.
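The first component, enhancing each face embedding with information from its neighbors, can be sketched as a simple k-nearest-neighbor blend. The cosine-similarity neighbor search, the blend weight `alpha`, and all names here are illustrative assumptions, not the paper's exact aggregation rule.

```python
import numpy as np

def enhance_embeddings(emb, k=3, alpha=0.5):
    """Illustrative embedding enhancement: blend each embedding with the
    mean of its k nearest neighbours (by cosine similarity), so that nodes
    of the same identity drift toward a common centre."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-matches
    nn = np.argsort(-sim, axis=1)[:, :k]      # indices of k nearest neighbours
    enhanced = (1 - alpha) * emb + alpha * emb[nn].mean(axis=1)
    return enhanced / np.linalg.norm(enhanced, axis=1, keepdims=True)

# Two toy identity clusters; enhancement tightens each cluster
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal([5., 0., 0.], 0.5, (10, 3)),
                 rng.normal([0., 5., 0.], 0.5, (10, 3))])
enhanced = enhance_embeddings(emb, k=3)
```

Averaging over same-identity neighbors reduces per-sample noise, which is why intra-cluster similarity rises after enhancement and why the adjusted subgraphs achieve higher recall.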